Course Syllabus

Course Information

Course Title

資料科學與社會研究
Data Science and Social Inquiry

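Semester

112-1

Offered By

Graduate Institute of Economics, College of Social Sciences

Instructor

陳由常

Course Number

ECON5166

Course Identifier

323 U1250

Section

Credits

3.0

Full/Half Year

Half year

Required/Elective

Elective

Class Time

Wednesday, periods 6, 7, 8 (13:20-16:20)

Classroom

Social Sciences 506

Remarks

Required course for the undergraduate cross-disciplinary specialization "Data Science and Social Analysis" (資料科學與社會分析學士班跨域專長).
Restricted to undergraduates in their third year or above, master's students or above, or doctoral students.
Enrollment cap: 60 students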

Syllabus

Course Overview

Please check

https://docs.google.com/document/d/1Va_CnqUgMtGCAO6hRUENvu4F7j2KTsWAXSBxKP0OZ0M/edit?usp=sharing

for detailed information. Below is a problem set to help you decide whether you are ready for this course (draft version; please do not start working on it yet):

https://drive.google.com/file/d/1fWoYhHQmbVyyyupOJQp73sZDZRD_PZsK/view?usp=sharing

---

Econ 5166 serves as an introduction to “classical” machine learning (ML) methods such as PCA, LASSO, decision trees, random forests, and more, with a strong focus on their practical applications in social science research and business. This course is designed for students who have already completed an initial course in statistics, have some hands-on data manipulation experience, and are keen to delve into the underlying principles of machine learning.

Although NTU offers many excellent ML courses, Econ 5166 stands apart in two respects. First, the course emphasizes the relationship between ML and statistics. It shows how ML, like any data-based exploratory technique, fits into the broader statistical framework. We will illustrate the connections between fundamental statistical concepts and classical ML methods: correlation and PCA, OLS regression and LASSO, hypothesis testing and classification, to name a few. This class is also an opportunity to revisit statistics by exploring its core concepts (such as correlation and expectation) in light of real-world applications. Because our focus is on understanding the statistical underpinnings rather than merely the methodology, we will concentrate on the more traditional, accessible ML methods. Modern methods such as deep learning will be addressed conceptually, as extensions of the methods covered in class.
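As a small illustration of the correlation and PCA connection mentioned above, here is a minimal sketch using only the Python standard library. It is not course material: the simulated data and the helper names (`standardize`, `correlation`, `pc_variances`) are assumptions made for this example. For two standardized variables with correlation r, the correlation matrix is [[1, r], [r, 1]], whose principal directions are (1, 1)/√2 and (1, -1)/√2 with variances 1 + r and 1 - r, so the share of variance captured by the first component is determined entirely by the correlation.

```python
import math
import random


def standardize(values):
    """Rescale values to mean 0 and population variance 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def correlation(x, y):
    """Pearson correlation: the average product of standardized values."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def pc_variances(x, y):
    """Variances of the data projected on (1, 1)/sqrt(2) and (1, -1)/sqrt(2).

    These two directions are the eigenvectors of the 2x2 correlation
    matrix, so the projections are the two principal components.
    """
    zx, zy = standardize(x), standardize(y)
    n = len(zx)
    pc_sum = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]
    pc_diff = [(a - b) / math.sqrt(2) for a, b in zip(zx, zy)]
    # Both projections have mean zero, so the variance is the mean square.
    var_sum = sum(v * v for v in pc_sum) / n
    var_diff = sum(v * v for v in pc_diff) / n
    return var_sum, var_diff


# Simulated (hypothetical) data: y is positively correlated with x.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.8 * a + random.gauss(0, 0.6) for a in x]

r = correlation(x, y)
var_sum, var_diff = pc_variances(x, y)
print(f"correlation r           = {r:.3f}")
print(f"variance along (1, 1)   = {var_sum:.3f}  (1 + r = {1 + r:.3f})")
print(f"variance along (1, -1)  = {var_diff:.3f}  (1 - r = {1 - r:.3f})")
print(f"share explained by PC 1 = {max(var_sum, var_diff) / 2:.3f}")
```

With more than two variables the eigenvectors no longer have this closed form, but the idea is the same: PCA summarizes the correlation structure of the data.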

The second distinctive feature of this course is an in-depth exploration of ML applications in social science research and, to a lesser extent, business. Our primary goal is to equip you with practical ML skills to tackle real-world challenges effectively. To achieve this, each method we discuss will be motivated by business applications, followed by an analysis of a research paper or one of my own research projects to demonstrate the relevance of ML. In addition, an integral part of this course is a project assignment in which you will refine your skills in problem formulation, coding for data analysis, precise interpretation of statistical results, and effective communication of your findings. This hands-on approach not only solidifies your theoretical understanding but also enhances your ability to use ML methods in practical, real-world scenarios.

Course Objectives

1. Developing working knowledge of machine learning methods: Students will develop an intuitive understanding of the principles behind various machine learning methods through their mathematical definitions and algorithms. They will also be able to apply this knowledge in actual data analysis work, such as feature selection and the interpretation of analysis results.

2. Understanding machine learning algorithms through statistics: Students will use basic concepts such as conditional expectation to grasp the statistical meaning of these algorithms (for example, cross-validation). The course also helps students revisit basic statistical concepts such as correlation, regression analysis, and hypothesis testing from a practical, applied perspective.

3. Cultivating data processing skills: Students will learn a range of data processing skills, including data cleaning, ETL (extract, transform, load), web crawling, data visualization, and the development of data products, as well as how to verify data reliability and check for potential errors in the analysis process.

4. Fostering basic literacy in data science: Students will cultivate abilities essential to a data scientist, such as mathematical maturity and a solid command of statistics, and will refine their scientific problem-solving method in the final project. They will also learn how to apply data science in a business environment and gain a preliminary understanding of the division of labor and the skills required for various job functions.
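To make the link between conditional expectation and cross-validation in objective 2 concrete, here is a minimal sketch, again using only the standard library. The data-generating process, the function names (`fit_ols`, `k_fold_mse`), and the choice of 5 folds are assumptions made for illustration, not the course's prescribed procedure. The data are simulated so that E[y | x] = 1 + 2x with noise variance 1; k-fold cross-validation then estimates the expected squared prediction error on unseen data, which for a correctly specified model should be close to the noise variance.

```python
import random


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_mse(xs, ys, k=5):
    """Average held-out squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Every k-th observation, starting at `fold`, is held out.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        intercept, slope = fit_ols(train_x, train_y)
        squared_errors = [
            (ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx
        ]
        fold_errors.append(sum(squared_errors) / len(squared_errors))
    return sum(fold_errors) / k


# Simulated (hypothetical) data: E[y | x] = 1 + 2x, noise variance 1.
random.seed(1)
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [1 + 2 * x + random.gauss(0, 1) for x in xs]

intercept, slope = fit_ols(xs, ys)
cv_mse = k_fold_mse(xs, ys, k=5)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
print(f"5-fold CV estimate of prediction error: {cv_mse:.2f}")
```

The same loop can be used to compare competing models: the one with the lower cross-validated error is the one expected to predict better on new data.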

Course Requirements

1. Homework
2. Midterm
3. Final Project Presentation

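Expected Weekly Study Hours Outside Class

Office Hours

Required Reading

To be announced

References

Murphy (2022), Probabilistic Machine Learning: An Introduction

Grading
(for reference only)

Accommodations for Students with Difficulties

Class format	Supplemented with recordings
Assignment submission
Exam format
Other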

Course Schedule

Week	Date	Topic
Week 1	9/06	Introduction
Week 2	9/13	Principal Component Analysis
Week 3	9/20	Principal Component Analysis
Week 4	9/27	Factor Analysis
Week 5	10/04	Clustering
Week 6	10/11	Clustering
Week 7	10/18	Project Discussion (No Class)
Week 8	10/25	Penalized Regression
Week 9	11/01	Penalized Regression
Week 10	11/08	Penalized Regression
Week 11	11/15	Midterm
Week 12	11/22	Project Discussion (No Class)
Week 13	11/29	Tree Algorithms
Week 14	12/06	Tree Algorithms
Week 15	12/13	Tree Algorithms
Week 16	12/20	Final Project Rehearsal (No Class / Graded)
Week 17	12/27	Project Presentation